Sweet Lift Taxi

The Sweet Lift Taxi company has collected data on taxi orders at airports. Their aim is to predict the number of taxi orders for the next hour so that more drivers can be allocated during peak hours. We will build a model with an RMSE lower than 48.
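
Since the target is stated as an RMSE, it may help to see how that metric is computed. The following is a minimal sketch in plain Python; the function name `rmse` and the sample values are made up for illustration and are not part of the taxi dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly order counts and forecasts (illustration only).
actual = [100, 120, 80, 60]
predicted = [90, 130, 85, 60]

error = rmse(actual, predicted)
print(error)  # squared errors are 100, 100, 25, 0, so the result is sqrt(56.25) = 7.5
```

Because the errors are squared before averaging, a few badly missed peak hours raise the RMSE more than many small misses do.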


In [1]:
# !pip install --user plotly_express

Import Libraries

In [2]:
# Import libraries
import pandas as pd 
import numpy as np
import plotly_express as px 
from matplotlib import pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, VotingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse, mean_absolute_error as mae, make_scorer
from xgboost import XGBRegressor 
from lightgbm import LGBMRegressor 
import lightgbm as lgb
from catboost import CatBoostRegressor, Pool
In [3]:
# read dataframe
df = pd.read_csv('datasets/taxi.csv', parse_dates=['datetime'], index_col=['datetime'])
In [4]:
# sorting index
df.sort_index(inplace=True)
In [5]:
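# checking that the index is monotonically increasing
# (Index.is_monotonic was removed in pandas 2.0; is_monotonic_increasing is the equivalent)
print(df.index.is_monotonic_increasing)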
True
In [6]:
# look at dataframe
df.head()
Out[6]:
num_orders
datetime
2018-03-01 00:00:00 9
2018-03-01 00:10:00 14
2018-03-01 00:20:00 28
2018-03-01 00:30:00 20
2018-03-01 00:40:00 32
In [7]:
# info on num orders columns
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 26496 entries, 2018-03-01 00:00:00 to 2018-08-31 23:50:00
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   num_orders  26496 non-null  int64
dtypes: int64(1)
memory usage: 414.0 KB
In [8]:
# confirming no missing values
df.isna().sum()
Out[8]:
num_orders    0
dtype: int64
In [9]:
# resample data by the hour
df = df.resample('1H').sum()

We loaded the data, parsing the datetime column as dates and using it as the index, and then sorted the index. We confirmed that there are no missing values. Following that, we resampled the 10-minute records into hourly totals.
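
To make the resampling step concrete, here is a small sketch on made-up 10-minute counts (the values and the `toy` frame are assumptions for illustration, not rows from taxi.csv). It shows that summing within each hour turns six 10-minute rows into one hourly row.

```python
import pandas as pd

# Twelve hypothetical 10-minute records covering two hours.
index = pd.date_range("2018-03-01 00:00", periods=12, freq="10min")
toy = pd.DataFrame({"num_orders": list(range(1, 13))}, index=index)

# One-hour bins; this plays the same role as the '1H' rule used above.
hourly = toy.resample(pd.Timedelta(hours=1)).sum()
print(hourly)
```

Using `sum()` rather than `mean()` keeps the target in units of orders per hour, which matches what the model is asked to predict.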


EDA

In [10]:
# summary statistics on the number of orders
df.describe()
Out[10]:
num_orders
count 4416.000000
mean 84.422781
std 45.023853
min 0.000000
25% 54.000000
50% 78.000000
75% 107.000000
max 462.000000
In [11]:
# hourly number of orders
fig = px.line(df.num_orders, title='Total Hourly Number of Orders', template='ggplot2', height=600, labels={'value': 'Number of Orders'})
fig.update_xaxes(rangeslider_visible=True, 
    rangeselector=dict(
        buttons=list([
            dict(count=1, label='1m', step='month', stepmode='backward'), 
            dict(count=6, label='6m', step='month', stepmode='backward')
            ])
        )
    )
fig.show()
[Figure: line chart "Total Hourly Number of Orders"; x-axis: datetime (Mar 2018 to Aug 2018), y-axis: Number of Orders (0 to 400); range buttons: 1m, 6m]

This plot shows the time series resampled to hourly totals. The y-axis shows the number of taxi orders.

In [12]:
# distribution of orders
px.box(df.num_orders, title='Distribution of Orders', template='ggplot2', labels={'variable': 'Orders', 'value': 'count'}, height=600)
[Figure: box plot "Distribution of Orders"; x-axis: Orders (num_orders), y-axis: count (0 to 400)]

Here, we have the distribution of the number of orders. Values above 186 are flagged as outliers. The median is 78 orders per hour, and the mean is about 84.4.
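
The 186 threshold is consistent with the common 1.5 × IQR rule applied to the quartiles reported by `df.describe()`. A minimal check, assuming that rule is what the box plot uses for its fences:

```python
# Quartiles copied from the df.describe() output above.
q1, q3 = 54.0, 107.0

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # points above this are drawn as outliers
lower_fence = q1 - 1.5 * iqr     # negative here, so no low outliers are possible

print(iqr, upper_fence, lower_fence)  # 53.0 186.5 -25.5
```

The upper whisker is drawn at the largest observation that does not exceed the fence, which would explain a whisker at 186 rather than 186.5.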

In [13]:
# Rolling mean of hourly orders over several window sizes (in hours)
numbers = [6, 12, 24, 168, 720] 
for i in numbers:
    px.line(df.num_orders.rolling(i).mean(), title=f'Mean Number of Orders per {i} Hours', template='ggplot2', labels={'value': 'Number of Orders', 'datetime':'Dates'}, height=600).show()
[Figures: five line charts of num_orders titled "Mean Number of Orders per 6 Hours", "Mean Number of Orders per 12 Hours", "Mean Number of Orders per 24 Hours", "Mean Number of Orders per 168 Hours", and "Mean Number of Orders per 720 Hours"; x-axis: Dates (Mar 2018 to Aug 2018), y-axis: Number of Orders]

These visuals show the rolling mean of the number of orders over windows of 6 hours, 12 hours, 1 day (24 hours), 1 week (168 hours), and 30 days (720 hours). The wider windows smooth out the daily cycle and let us clearly see the number of orders increase gradually from April to August.
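
A rolling mean keeps one value per hour, whereas resampling collapses each window into a single row. The sketch below contrasts the two on six made-up hourly values (the series `s` is an assumption for illustration only).

```python
import pandas as pd

# Six hypothetical hourly observations.
index = pd.date_range("2018-03-01", periods=6, freq=pd.Timedelta(hours=1))
s = pd.Series([10, 20, 30, 40, 50, 60], index=index)

# Trailing 3-hour rolling mean: same length as s, first two values are NaN.
rolling_mean = s.rolling(3).mean()

# 3-hour resample: one row per non-overlapping 3-hour bin.
resampled_mean = s.resample(pd.Timedelta(hours=3)).mean()

print(rolling_mean)
print(resampled_mean)
```

The leading NaN values are why the 720-hour curve starts roughly a month after the first observation.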

In [14]:
# decomposed dataset
decomposed = seasonal_decompose(df)
In [15]:
# decomposed trend 
px.line(decomposed.trend, title='Trend', template='ggplot2')
[Figure: line chart "Trend"; x-axis: datetime (Mar 2018 to Aug 2018), y-axis: value (about 40 to 180)]
In [16]:
# decomposed seasonality 
px.line(decomposed.seasonal, title='Seasonality', template='ggplot2')
[Figure: line chart "Seasonality"; x-axis: datetime (Mar 2018 to Aug 2018), y-axis: value (about −60 to 60)]
In [17]:
# decomposed residual 
px.line(decomposed.resid, title='Residual', template='ggplot2')
[Figure: line chart "Residual"; x-axis: datetime (Mar 2018 to Aug 2018), y-axis: value (about −100 to 300)]

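These visualizations show the series decomposed into trend, seasonal, and residual components. The trend closely resembles the 24-hour rolling mean chart above. The seasonality chart looks like a dense band, most likely because the seasonal pattern repeats every day and six months of daily cycles are compressed onto one axis; zooming in to a few days would reveal the cycle. The residuals generally fluctuate around 0, but start to show larger outliers in August.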

In [18]:
# difference in number of orders
fig = px.line(df.num_orders-df.num_orders.shift(), title='Difference in Number of Orders', template='ggplot2', height=700)
fig.update_xaxes(rangeslider_visible=True, 
    rangeselector=dict(
        buttons=list([
            dict(count=1, label='1m', step='month', stepmode='backward'), 
            dict(count=6, label='6m', step='month', stepmode='backward')
            ])
        )
    )
fig.show()
[Figure: line chart "Difference in Number of Orders"; x-axis: datetime (Mar 2018 to Aug 2018), y-axis: value (about −200 to 200); range buttons: 1m, 6m]

Looking at the hour-to-hour difference in the number of orders, we see that the magnitude of the differences grows over time. The swings become most pronounced in August.


Model Preparation

In [19]:
# Making features, max lag 24 and rolling mean 24
def make_features(data, max_lag, rolling_mean_size):
    data['year'] = data.index.year
    data['month'] = data.index.month
    data['day'] = data.index.day
    data['dayofweek'] = data.index.dayofweek
    data['hour'] = data.index.hour

    for lag in range(1, max_lag + 1):
        data['lag_{}'.format(lag)] = data['num_orders'].shift(lag)

    data['rolling_mean'] = (
        data['num_orders'].shift().rolling(rolling_mean_size).mean()
    )


make_features(df, 24, 24)
In [20]:
# splitting dataset to train, valid, and test
train, test = train_test_split(df, shuffle=False, test_size=0.1, random_state=19)
train, valid = train_test_split(train, shuffle=False, test_size=0.11, random_state=19)

print('Train Dataset = ', ' Start : ', train.index.min(), '   End : ', train.index.max(), '   Difference : ', abs(train.index.min() - train.index.max()))
print('Valid Dataset = ', ' Start : ', valid.index.min(), '   End : ', valid.index.max(), '   Difference : ', abs(valid.index.min() - valid.index.max()))
print('Test Dataset = ', ' Start : ', test.index.min(),  '   End : ', test.index.max(), '   Difference : ', abs(test.index.min() - test.index.max()))
Train Dataset =   Start :  2018-03-01 00:00:00    End :  2018-07-26 07:00:00    Difference :  147 days 07:00:00
Valid Dataset =   Start :  2018-07-26 08:00:00    End :  2018-08-13 13:00:00    Difference :  18 days 05:00:00
Test Dataset =   Start :  2018-08-13 14:00:00    End :  2018-08-31 23:00:00    Difference :  18 days 09:00:00
In [21]:
# Dropping missing values from datasets 
train = train.dropna()
valid = valid.dropna() 
test = test.dropna()
In [22]:
# Splitting target and features
X_train = train.drop(columns='num_orders')
y_train = train.num_orders

X_valid = valid.drop(columns='num_orders')
y_valid = valid.num_orders 

X_test = test.drop(columns='num_orders')
y_test = test.num_orders
In [23]:
features = df.drop(columns='num_orders')
target = df.num_orders
In [24]:
# visual of train test split
fig, ax = plt.subplots(figsize=(25,15))
train.num_orders.plot(ax=ax, label='Training Set', title='Data Train/Test Split')
valid.num_orders.plot(ax=ax, label='Valid Set', color='green')
test.num_orders.plot(ax=ax, label='Test Set')
ax.axvline('2018-07-26 08:00:00', color='black', ls='--')
ax.axvline('2018-08-13 14:00:00', color='black', ls='--')
ax.legend()  # use the labels passed to each plot call, so all three sets are named correctly
plt.show()

We define a function that builds calendar features (year, month, day, day of week, hour), 24 lag features, and a 24-hour rolling mean. The rolling mean is computed on the shifted series, so the current hour's value never leaks into its own features. We then split the data chronologically into three parts: train, valid, and test. We train the models on the training set and tune them on the validation set. The test set is reserved for evaluating the final model we choose. Since it is crucial to have adequate training data, we limited the validation and test sets to about 10% of the data each, which leaves roughly 80% for training. Because this is a time series, we cannot split by randomly sampling points, so shuffle was set to False. This keeps the three sets in chronological order, as the last figure shows.
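
The same chronological split can be written by slicing on position, which makes the ordering explicit. This is a minimal sketch on a toy hourly series; `chronological_split` is a hypothetical helper, not a scikit-learn function, and the data are made up.

```python
import numpy as np
import pandas as pd


def chronological_split(frame, valid_frac=0.1, test_frac=0.1):
    """Split a time-ordered frame into train/valid/test without shuffling."""
    n = len(frame)
    n_test = int(round(n * test_frac))
    n_valid = int(round(n * valid_frac))
    train = frame.iloc[: n - n_valid - n_test]
    valid = frame.iloc[n - n_valid - n_test : n - n_test]
    test = frame.iloc[n - n_test :]
    return train, valid, test


# Toy hourly series with one lag feature (illustration only).
index = pd.date_range("2018-03-01", periods=100, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"num_orders": np.arange(100)}, index=index)
toy["lag_1"] = toy["num_orders"].shift(1)
toy = toy.dropna()  # the first row has no lag value

train, valid, test = chronological_split(toy)

# Every training timestamp precedes every validation timestamp, and so on.
print(train.index.max() < valid.index.min() < test.index.min())
print(len(train), len(valid), len(test))
```

Only the earliest rows contain missing lag values, so dropping them before or after the split removes rows from the training set alone.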


Modeling

Linear Regression

In [25]:
# linear regression
lr = LinearRegression() # initialize model constructor
lr.fit(X_train, y_train) # train model on training set

predictions_valid_lr = lr.predict(X_valid) # get model predictions on validation set

result = mse(y_valid, predictions_valid_lr) ** 0.5 # calculate RMSE on validation set
print("RMSE of the linear regression model on the validation set:", result)
RMSE of the linear regression model on the validation set: 34.282142440117
In [26]:
# Get the coefficients of the fitted linear regression model
lr_importances = lr.coef_

# Create a dataframe with the coefficients and the corresponding feature names
lr_importances_df = pd.DataFrame({'feature':X_train.columns, 'coefficients':lr.coef_})

# Sort the dataframe by coefficient value
lr_importances_df.sort_values(by='coefficients', ascending=False, inplace=True)
lr_importances_df
Out[26]:
feature coefficients
29 rolling_mean 3.443462e+12
1 month 3.741096e+00
4 hour 6.812351e-01
2 day 1.919355e-01
0 year -1.203887e-15
3 dayofweek -2.726262e-01
28 lag_24 -1.434776e+11
5 lag_1 -1.434776e+11
12 lag_8 -1.434776e+11
6 lag_2 -1.434776e+11
27 lag_23 -1.434776e+11
11 lag_7 -1.434776e+11
15 lag_11 -1.434776e+11
20 lag_16 -1.434776e+11
17 lag_13 -1.434776e+11
26 lag_22 -1.434776e+11
9 lag_5 -1.434776e+11
16 lag_12 -1.434776e+11
21 lag_17 -1.434776e+11
7 lag_3 -1.434776e+11
8 lag_4 -1.434776e+11
14 lag_10 -1.434776e+11
10 lag_6 -1.434776e+11
25 lag_21 -1.434776e+11
18 lag_14 -1.434776e+11
19 lag_15 -1.434776e+11
22 lag_18 -1.434776e+11
23 lag_19 -1.434776e+11
24 lag_20 -1.434776e+11
13 lag_9 -1.434776e+11

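The rolling mean has the largest coefficient among the linear regression features, but the huge magnitudes (about 3.44e12 for rolling_mean and -1.43e11 for every lag) are an artifact of collinearity rather than a sign of importance. The rolling_mean feature is exactly the mean of lag_1 through lag_24, and 24 × 1.434776e11 ≈ 3.443462e12, so the two sets of coefficients largely cancel each other. The year coefficient is effectively zero because the year is constant in this dataset. We achieve an RMSE of 34.28 on the validation set.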
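
Because `make_features` computes rolling_mean from the same shifted values as lag_1 to lag_24, that column is a linear combination of the lag columns. The sketch below checks this on synthetic Poisson counts (an assumption for illustration, not the taxi data) by comparing the two quantities and measuring the rank of the feature matrix.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts; the seed and rate are arbitrary.
rng = np.random.default_rng(19)
toy = pd.DataFrame({"num_orders": rng.poisson(80, size=200).astype(float)})

max_lag = 24
for lag in range(1, max_lag + 1):
    toy[f"lag_{lag}"] = toy["num_orders"].shift(lag)
toy["rolling_mean"] = toy["num_orders"].shift().rolling(max_lag).mean()
toy = toy.dropna()

lag_cols = [f"lag_{lag}" for lag in range(1, max_lag + 1)]

# rolling_mean matches the row-wise mean of the 24 lag columns ...
max_gap = (toy[lag_cols].mean(axis=1) - toy["rolling_mean"]).abs().max()

# ... so the 25-column matrix only has rank 24.
X = toy[lag_cols + ["rolling_mean"]].to_numpy()
rank = np.linalg.matrix_rank(X)

print(max_gap, X.shape[1], rank)
```

With a rank-deficient matrix, ordinary least squares has no unique solution, so individual coefficients can become arbitrarily large while the predictions stay reasonable. Dropping either rolling_mean or one lag column, or using a different rolling window size than the maximum lag, would likely give interpretable coefficients.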

Decision Tree

In [27]:
# Decision Tree
best_model = None
best_result = 50
best_depth = 0
for depth in range(1, 6): # choose hyperparameter range
    dtr = DecisionTreeRegressor(random_state=19, max_depth=depth)
    dtr.fit(X_train, y_train) # train model on training set
    predictions_valid_dtr = dtr.predict(X_valid) # get model predictions on validation set
    result = mse(y_valid, predictions_valid_dtr) ** 0.5
    if result < best_result:
        best_model = dtr
        best_result = result
        best_depth = depth

print(f"RMSE of the best model on the validation set (max_depth = {best_depth}): {best_result}")
RMSE of the best model on the validation set (max_depth = 5): 38.54337041017864
In [28]:
# Get the feature importances of the best model (not the last tree fitted in the loop)
dtr_importances = best_model.feature_importances_

# Create a dataframe with the feature importances and the corresponding feature names
dtr_importances_df = pd.DataFrame({'feature':X_train.columns, 'importance':dtr_importances})

# Sort the dataframe by importance
dtr_importances_df.sort_values(by='importance', ascending=False, inplace=True)
In [29]:
# top 10 feature importances
px.pie(dtr_importances_df.head(10), names='feature', values='importance', title='Top 10 Feature Importance for Decision Tree Regression', template='ggplot2', hole=0.2)
[Figure: donut chart "Top 10 Feature Importance for Decision Tree Regression"; legend order: lag_24, lag_1, hour, lag_3, lag_2, lag_6, lag_23, rolling_mean, lag_7, lag_20; slice labels: 64%, 15%, 12%, 2.39%, 1.37%, 1.28%, 1.27%, 1.15%, 0.992%, 0.586%]

The lag_24 feature is the most important in the decision tree regression. We achieve an RMSE of 38.54 on the validation set.

Random Forest

In [30]:
# Random Forest 
best_model = None
best_result = 50
best_est = 0
best_depth = 0
for est in range(600, 601):
    for depth in range (100, 101):
        rf = RandomForestRegressor(random_state=19, n_estimators=est, max_depth=depth)
        rf.fit(X_train, y_train) # train model on training set
        predictions_valid = rf.predict(X_valid) # get model predictions on validation set
        result = mse(y_valid, predictions_valid) ** 0.5 # calculate RMSE on validation set
        if result < best_result:
            best_model = rf
            best_result = result
            best_est = est
            best_depth = depth
            
print("RMSE of the best model on the validation set:", best_result, "n_estimators:", best_est, "best_depth:", best_depth)
RMSE of the best model on the validation set: 32.013650992753774 n_estimators: 600 best_depth: 100
In [31]:
# Get the feature importances of the best model
rf_importances = best_model.feature_importances_

# Create a dataframe with the feature importances and the corresponding feature names
rf_importances_df = pd.DataFrame({'feature':X_train.columns, 'importance':rf_importances})

# Sort the dataframe by importance
rf_importances_df.sort_values(by='importance', ascending=False, inplace=True)
In [32]:
# top 10 feature importances
px.pie(rf_importances_df.head(10), names='feature', values='importance', title='Top 10 Feature Importance for Random Forest Regression', template='ggplot2', hole=0.2)
[Figure: donut chart "Top 10 Feature Importance for Random Forest Regression"; legend order: lag_24, lag_1, hour, lag_2, lag_7, lag_3, lag_23, rolling_mean, lag_6, lag_12; slice labels: 54.5%, 12.8%, 8.99%, 4.85%, 4.1%, 3.19%, 3.18%, 2.84%, 2.84%, 2.7%]

The lag_24 feature has the greatest importance in the random forest model. The RMSE on the validation set is 32.01.
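
The search above evaluates a single n_estimators and max_depth pair on one validation window. If a wider search were run, scoring each candidate on several time-ordered windows would give a steadier estimate. Below is a hand-written sketch of expanding-window splits in plain Python; `expanding_window_splits` is a hypothetical helper, not a library function, and the sizes are arbitrary.

```python
def expanding_window_splits(n_samples, n_splits, valid_size):
    """Yield (train_indices, valid_indices) pairs that respect time order.

    Each validation block follows its training block, and the training
    block grows by valid_size rows from one split to the next.
    """
    first_valid_start = n_samples - n_splits * valid_size
    if first_valid_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        valid_start = first_valid_start + k * valid_size
        train_idx = list(range(0, valid_start))
        valid_idx = list(range(valid_start, valid_start + valid_size))
        yield train_idx, valid_idx


splits = list(expanding_window_splits(n_samples=10, n_splits=3, valid_size=2))
for train_idx, valid_idx in splits:
    print(len(train_idx), valid_idx)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

Averaging the RMSE over the windows would then replace the single validation score used to pick the best hyperparameters.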

AdaBoost

In [33]:
# ADA Boost
regr = AdaBoostRegressor(random_state=19, n_estimators=100)
regr.fit(X_train, y_train)  

predictions_valid_regr = regr.predict(X_valid) # get model predictions on validation set

result = mse(y_valid, predictions_valid_regr) ** 0.5 # calculate RMSE on validation set
print("RMSE of the ada boost regression model on the validation set:", result)
RMSE of the ada boost regression model on the validation set: 34.89293536381952
In [34]:
# table of feature importance
regr_imp = [t for t in zip(features, regr.feature_importances_)]
regr_imp_df = pd.DataFrame(regr_imp, columns=['feature', 'varimp'])
regr_imp_df = regr_imp_df.sort_values('varimp', ascending=False)
In [35]:
# top 10 feature importances
px.pie(regr_imp_df.head(10), names='feature', values='varimp', title='Top 10 Feature Importance for Ada Boost Regression', hole=.2, template='ggplot2')
[Figure: donut chart "Top 10 Feature Importance for Ada Boost Regression"; legend order: lag_24, hour, lag_22, lag_17, lag_1, lag_23, lag_8, lag_2, lag_7, lag_12; slice labels: 25.4%, 14.6%, 10.3%, 8.77%, 8.25%, 7.98%, 7.65%, 7.62%, 5.88%, 3.46%]

The lag_24 feature is the most important in the AdaBoost model, followed by hour according to the chart above. The RMSE on the validation set is 34.89.

Gradient Boosting

In [36]:
# Gradient Boost
gbr = GradientBoostingRegressor(random_state=19, learning_rate=0.2, n_estimators=1000, verbose=100, max_depth=3)
gbr.fit(X_train, y_train)

predictions_valid_gbr = gbr.predict(X_valid)

result = mse(y_valid, predictions_valid_gbr) ** 0.5 # calculate RMSE on validation set
print("RMSE of the gradient boosting model on the validation set:", result)
      Iter       Train Loss   Remaining Time 
         1        1055.7262           56.79s
         2         902.6644           56.73s
         3         800.7340           58.67s
         4         723.8976           55.13s
         5         664.3207           56.17s
         6         624.8460           57.00s
         7         593.0704           58.01s
         8         561.8075           54.66s
         9         540.4896           53.81s
        10         523.9846           54.70s
        11         507.7131           52.73s
        12         496.1685           51.32s
        13         485.1000           50.20s
        14         477.5934           48.89s
        15         468.6034           47.74s
        16         460.8223           49.19s
        17         452.4853           48.90s
        18         446.5326           48.48s
        19         442.2676           48.30s
        20         433.9193           47.45s
        21         430.5671           46.77s
        22         423.8965           46.20s
        23         422.1899           45.54s
        24         417.2277           45.18s
        25         412.5752           44.69s
        26         408.2447           44.12s
        27         405.0043           43.60s
        28         401.5569           43.03s
        29         396.6786           42.54s
        30         394.4030           42.05s
        31         391.1918           41.84s
        32         387.1093           41.48s
        33         385.1414           41.09s
        34         384.1328           40.72s
        35         381.3070           40.50s
        36         378.0224           40.35s
        37         375.9459           40.08s
        38         373.1744           39.92s
        39         370.8267           39.62s
        40         367.8561           39.42s
        41         366.4715           39.14s
        42         362.9289           38.90s
        43         360.9791           38.69s
        44         360.2535           38.57s
        45         358.7802           38.39s
        46         355.6654           38.16s
        47         353.4201           38.50s
        48         350.8707           39.03s
        49         348.9316           39.49s
        50         347.6763           40.06s
        51         345.4044           40.85s
        52         342.4857           41.47s
        53         341.0666           41.60s
        54         338.8986           41.92s
        55         337.3331           42.35s
        56         334.7584           42.53s
        57         333.3173           42.73s
        58         331.8567           43.00s
        59         329.4319           42.92s
        60         327.1108           42.65s
        61         324.1376           43.00s
        62         321.6674           43.67s
        63         320.2303           44.15s
        64         318.8377           44.64s
        65         318.4357           44.58s
        66         317.2843           44.66s
        67         316.8266           44.76s
        68         315.5911           44.68s
        69         314.0183           44.71s
        70         313.1352           44.66s
        71         312.0731           44.61s
        72         310.9500           44.60s
        73         309.7945           44.50s
        74         308.2055           44.47s
        75         307.3895           44.41s
        76         305.6352           44.36s
        77         303.2277           44.34s
        78         300.9343           44.30s
        79         299.6684           44.31s
        80         299.1902           44.44s
        81         298.0853           44.35s
        82         296.1359           44.42s
        83         294.5330           44.25s
        84         293.1980           44.35s
        85         291.5203           44.30s
        86         289.6140           44.12s
        87         288.9266           43.89s
        88         287.4763           43.88s
        89         286.4075           44.03s
        90         284.5592           43.88s
        91         284.3237           43.97s
        92         283.3294           43.83s
        93         282.0784           43.96s
        94         281.5397           43.95s
        95         280.3133           43.87s
        96         278.8913           43.88s
        97         277.4948           43.79s
        98         276.9515           43.76s
        99         275.0107           43.69s
       100         273.5208           43.68s
       101         272.7373           43.57s
       102         271.6324           43.58s
       103         270.9054           43.42s
       104         269.5380           43.48s
       105         269.1883           43.35s
       106         267.8702           43.26s
       107         266.5946           43.25s
       108         266.1640           43.25s
       109         264.6132           43.13s
       110         262.9159           42.93s
       111         262.1642           42.93s
       112         261.4556           42.81s
       113         260.7288           42.77s
       114         259.3995           42.67s
       115         258.4280           42.66s
       116         257.4415           42.59s
       117         256.9665           42.58s
       118         256.4430           42.49s
       119         255.2727           42.39s
       120         254.0211           42.42s
       121         253.1970           42.31s
       122         252.0520           42.34s
       123         251.8655           42.16s
       124         251.0359           42.07s
       125         250.1147           42.07s
       126         248.8988           42.03s
       127         246.8937           42.10s
       128         245.9952           42.09s
       129         244.7595           42.00s
       130         243.4576           42.20s
       131         241.8065           42.25s
       132         241.4891           42.26s
       133         240.0669           42.09s
       134         239.0404           42.07s
       135         238.7721           42.01s
       136         237.4362           41.97s
       137         236.7729           41.95s
       138         235.4824           41.88s
       139         234.7351           41.84s
       140         233.7724           41.82s
       141         232.7002           41.77s
       142         231.4782           41.72s
       143         229.6106           41.72s
       144         229.2437           41.76s
       145         228.9692           41.60s
       146         228.0804           41.52s
       147         227.4072           41.50s
       148         226.6947           41.45s
       149         225.9220           41.51s
       150         225.6887           41.43s
       151         225.0323           41.44s
       152         223.4343           41.40s
       153         223.2424           41.27s
       154         223.0911           41.30s
       155         222.0542           41.19s
       156         220.5953           41.20s
       157         219.6860           41.08s
       158         218.6921           41.08s
       159         217.7972           40.97s
       160         216.9563           40.97s
       161         215.9278           40.87s
       162         214.5232           40.73s
       163         213.3469           40.76s
       164         212.3832           40.73s
       165         211.7076           40.68s
       166         210.2515           40.65s
       167         209.9564           40.60s
       168         208.7963           40.55s
       169         207.9157           40.53s
       170         207.6545           40.46s
       171         207.2754           40.44s
       172         206.1688           40.40s
       173         205.9700           40.38s
       174         204.6435           40.32s
       175         203.6276           40.33s
       176         203.1619           40.32s
       177         202.2022           40.26s
       178         201.4156           40.22s
       179         201.2978           40.14s
       180         200.2885           40.11s
       181         200.1798           40.03s
       182         199.3669           39.99s
       183         198.3831           39.95s
       184         197.4131           39.88s
       185         196.5684           39.77s
       186         195.4207           39.77s
       187         194.5461           39.68s
       188         193.9176           39.66s
       189         192.9804           39.58s
       190         192.2195           39.56s
       191         191.7909           39.49s
       192         191.0509           39.45s
       193         190.5408           39.43s
       194         189.7018           39.35s
       195         188.9814           39.33s
       196         188.6125           39.25s
       197         188.1825           39.22s
       198         187.6218           39.14s
       199         186.9988           39.11s
       200         186.1205           39.03s
       201         185.8713           38.99s
       202         185.4968           38.93s
       203         184.6223           38.89s
       204         183.9367           38.82s
       205         183.4210           38.78s
       206         182.8387           38.84s
       207         182.5824           38.84s
       208         181.9990           38.79s
       209         181.5682           38.68s
       210         180.8851           38.77s
       211         180.0577           38.80s
       212         179.3731           38.81s
       213         178.8296           38.75s
       214         178.4575           38.79s
       215         178.3745           38.76s
       216         178.1820           38.78s
       217         177.5213           38.82s
       218         177.0981           38.85s
       219         176.4036           38.86s
       220         175.7206           38.80s
       221         175.2264           38.70s
       222         174.7505           38.61s
       223         174.1359           38.51s
       224         173.6011           38.40s
       225         172.8501           38.29s
       226         172.1924           38.19s
       227         171.7496           38.09s
       228         171.2453           38.06s
       229         170.7370           38.03s
       230         170.2981           37.93s
       231         169.7500           37.84s
       232         169.4488           37.73s
       233         168.7189           37.63s
       234         168.0393           37.52s
       235         167.0320           37.43s
       236         166.8765           37.32s
       237         166.2960           37.22s
       238         165.7055           37.13s
       239         165.3356           37.02s
       240         165.1666           36.92s
       241         164.3345           36.83s
       242         163.6705           36.76s
       243         163.4298           36.67s
       244         163.1072           36.58s
       245         162.8726           36.48s
       246         162.4190           36.40s
       247         162.1065           36.31s
       248         162.0323           36.23s
       249         161.2361           36.18s
       250         160.9223           36.09s
       251         160.7824           35.99s
       252         160.6567           35.89s
       253         160.0059           35.79s
       254         159.5545           35.69s
       255         158.6141           35.59s
       256         157.6411           35.49s
       257         156.8622           35.40s
       258         156.3643           35.32s
       259         155.6960           35.27s
       260         154.8697           35.20s
       261         154.3736           35.11s
       262         154.2928           35.02s
       263         154.1702           34.92s
       264         153.9868           34.82s
       265         153.3099           34.73s
       266         153.0980           34.63s
       267         152.6246           34.54s
       268         152.1311           34.45s
       269         151.5465           34.35s
       270         151.1301           34.27s
       271         150.7190           34.21s
       272         150.1633           34.12s
       273         149.4289           34.04s
       274         148.7573           33.96s
       275         148.2664           33.88s
       276         147.7297           33.78s
       277         147.2369           33.69s
       278         146.6337           33.61s
       279         146.0800           33.52s
       280         145.5741           33.43s
       281         144.8494           33.34s
       282         144.1090           33.26s
       283         143.8503           33.17s
       284         143.6853           33.08s
       285         143.3408           32.99s
       286         143.2404           32.90s
       287         143.1026           32.81s
       288         142.7623           32.73s
       289         142.3508           32.65s
       290         141.6417           32.57s
       291         140.9979           32.49s
       292         140.3641           32.40s
       293         139.7470           32.31s
       294         139.1171           32.23s
       295         138.8241           32.14s
       296         138.3002           32.04s
       297         137.7256           31.95s
       298         137.4539           31.87s
       299         137.2290           31.79s
       300         136.8925           31.70s
       301         136.3540           31.62s
       302         135.8399           31.53s
       303         135.6445           31.45s
       304         135.5275           31.37s
       305         134.9971           31.29s
       306         134.4919           31.20s
       307         133.9457           31.11s
       308         133.5933           31.02s
       309         133.5518           30.93s
       310         133.3521           30.84s
       311         133.0897           30.76s
       312         132.5042           30.68s
       313         132.1678           30.59s
       314         131.3247           30.50s
       315         130.7066           30.42s
       316         130.0524           30.34s
       317         129.8201           30.27s
       318         129.2236           30.19s
       319         128.9383           30.11s
       320         128.5765           30.03s
       321         128.1937           29.95s
       322         127.9376           29.86s
       323         127.3620           29.77s
       324         127.1299           29.68s
       325         126.8163           29.60s
       326         126.2178           29.51s
       327         125.8056           29.41s
       328         125.4083           29.34s
       329         124.8340           29.27s
       330         124.3252           29.19s
       331         123.9439           29.11s
       332         123.7074           29.04s
       333         123.1997           28.96s
       334         122.8533           28.86s
       335         122.7404           28.76s
       336         122.1975           28.69s
       337         121.7528           28.60s
       338         121.2797           28.50s
       339         120.8439           28.43s
       340         120.4679           28.34s
       341         119.9809           28.28s
       342         119.7189           28.20s
       343         119.2674           28.12s
       344         119.1212           28.05s
       345         118.8957           27.97s
       346         118.5084           27.90s
       347         118.0651           27.82s
       348         117.9895           27.76s
       349         117.9291           27.73s
       350         117.5767           27.71s
       351         117.3512           27.64s
       352         117.0751           27.57s
       353         116.8393           27.51s
       354         116.7934           27.43s
       355         116.4581           27.35s
       356         116.4185           27.28s
       357         116.0682           27.21s
       358         115.7497           27.15s
       359         115.6748           27.07s
       360         115.1859           27.01s
       361         114.8149           26.94s
       362         114.4470           26.88s
       363         114.1238           26.81s
       364         113.7294           26.73s
       365         113.1831           26.66s
       366         113.0356           26.58s
       367         112.6315           26.51s
       368         112.0882           26.44s
       369         111.6892           26.36s
       370         111.3174           26.29s
       371         111.0084           26.21s
       372         110.6746           26.14s
       373         110.3062           26.15s
       374         109.9633           26.11s
       375         109.5230           26.06s
       376         109.0951           25.99s
       377         108.7454           25.93s
       378         108.3521           25.87s
       379         107.9858           25.79s
       380         107.8854           25.72s
       381         107.7337           25.65s
       382         107.4410           25.58s
       383         106.9430           25.51s
       384         106.7746           25.44s
       385         106.6139           25.38s
       386         106.1896           25.32s
       387         105.7738           25.25s
       388         105.5081           25.17s
       389         105.3299           25.10s
       390         105.0266           25.03s
       391         104.7402           24.96s
       392         104.4191           24.89s
       393         104.0869           24.83s
       394         103.8967           24.77s
       395         103.4883           24.71s
       396         103.2495           24.64s
       397         103.1218           24.57s
       398         102.7776           24.50s
       399         102.3275           24.43s
       400         102.2159           24.36s
       401         102.0001           24.30s
       402         101.6897           24.22s
       403         101.4587           24.15s
       404         101.4068           24.09s
       405         101.1238           24.02s
       406         100.9377           23.95s
       407         100.5779           23.88s
       408         100.2145           23.81s
       409          99.8983           23.74s
       410          99.5763           23.67s
       411          99.3066           23.61s
       412          98.9422           23.54s
       413          98.7017           23.48s
       414          98.5452           23.43s
       415          98.1847           23.36s
       416          97.9476           23.30s
       417          97.8798           23.23s
       418          97.6242           23.16s
       419          97.3896           23.07s
       420          97.3446           23.02s
       421          96.9197           22.97s
       422          96.6910           22.89s
       423          96.6509           22.83s
       424          96.4408           22.77s
       425          96.0325           22.71s
       426          95.8514           22.65s
       427          95.5637           22.59s
       428          95.3167           22.52s
       429          95.1737           22.47s
       430          94.8804           22.42s
       431          94.6497           22.35s
       432          94.5731           22.28s
       433          94.5448           22.23s
       434          94.3330           22.17s
       435          94.0629           22.11s
       436          93.7731           22.05s
       437          93.6598           21.99s
       438          93.6135           21.93s
       439          93.3674           21.87s
       440          93.1153           21.80s
       441          93.0497           21.73s
       442          92.9693           21.68s
       443          92.9287           21.61s
       444          92.6099           21.55s
       445          92.3089           21.50s
       446          92.0667           21.43s
       447          91.7630           21.38s
       448          91.5105           21.33s
       449          91.3078           21.27s
       450          90.9470           21.20s
       451          90.5933           21.13s
       452          90.4383           21.08s
       453          90.2705           21.02s
       454          90.0625           20.95s
       455          89.7902           20.90s
       456          89.6502           20.84s
       457          89.5671           20.77s
       458          89.4432           20.73s
       459          89.3882           20.68s
       460          89.0465           20.63s
       461          88.8054           20.57s
       462          88.5735           20.51s
       463          88.1279           20.46s
       464          87.8351           20.40s
       465          87.5551           20.33s
       466          87.3919           20.29s
       467          87.2585           20.22s
       468          87.0797           20.18s
       469          86.7864           20.12s
       470          86.4725           20.07s
       471          86.2633           20.01s
       472          85.9592           19.97s
       473          85.7061           19.92s
       474          85.5653           19.86s
       475          85.1883           19.81s
       476          84.9246           19.77s
       477          84.5910           19.71s
       478          84.3501           19.66s
       479          84.2046           19.60s
       480          83.9294           19.54s
       481          83.7545           19.48s
       482          83.4281           19.42s
       483          83.1932           19.37s
       484          82.9807           19.33s
       485          82.8612           19.27s
       486          82.6281           19.21s
       487          82.3459           19.16s
       488          82.3077           19.10s
       489          82.1382           19.04s
       490          81.9576           18.98s
       491          81.6893           18.94s
       492          81.4510           18.89s
       493          81.0863           18.83s
       494          80.8377           18.77s
       495          80.6025           18.71s
       496          80.3971           18.66s
       497          80.2032           18.61s
       498          80.0493           18.55s
       499          79.8719           18.50s
       500          79.6783           18.44s
       501          79.5657           18.40s
       502          79.4650           18.34s
       503          79.1251           18.30s
       504          78.9823           18.24s
       505          78.9140           18.20s
       506          78.5656           18.13s
       507          78.5039           18.09s
       508          78.2727           18.03s
       509          78.0015           17.99s
       510          77.8739           17.94s
       511          77.8320           17.88s
       512          77.5516           17.84s
       513          77.3552           17.78s
       514          77.2392           17.74s
       515          76.9273           17.68s
       516          76.6854           17.64s
       517          76.4324           17.60s
       518          76.3278           17.54s
       519          76.1334           17.50s
       520          75.8134           17.45s
       521          75.7904           17.41s
       522          75.5370           17.35s
       523          75.2642           17.31s
       524          74.9648           17.26s
       525          74.7090           17.22s
       526          74.6401           17.16s
       527          74.5959           17.12s
       528          74.2849           17.07s
       529          74.0021           17.01s
       530          73.8100           16.97s
       531          73.5809           16.92s
       532          73.2991           16.86s
       533          73.0454           16.81s
       534          72.8420           16.77s
       535          72.6681           16.72s
       536          72.4411           16.66s
       537          72.1631           16.63s
       538          71.9370           16.57s
       539          71.8100           16.52s
       540          71.6491           16.47s
       541          71.5069           16.43s
       542          71.2218           16.37s
       543          71.0776           16.32s
       544          70.9463           16.28s
       545          70.6818           16.23s
       546          70.6452           16.18s
       547          70.4464           16.14s
       548          70.2820           16.10s
       549          69.9654           16.05s
       550          69.9267           16.01s
       551          69.7929           15.96s
       552          69.6631           15.91s
       553          69.4370           15.87s
       554          69.2039           15.83s
       555          69.0567           15.78s
       556          68.8276           15.74s
       557          68.5376           15.69s
       558          68.3891           15.65s
       559          68.1473           15.60s
       560          67.9442           15.55s
       561          67.8111           15.51s
       562          67.7899           15.46s
       563          67.6026           15.42s
       564          67.4991           15.37s
       565          67.3599           15.33s
       566          67.1283           15.28s
       567          66.9311           15.24s
       568          66.7955           15.19s
       569          66.5001           15.14s
       570          66.4032           15.11s
       571          66.3060           15.06s
       572          66.1104           15.02s
       573          65.9159           14.97s
       574          65.7521           14.93s
       575          65.7150           14.88s
       576          65.4147           14.83s
       577          65.1236           14.80s
       578          64.9326           14.75s
       579          64.7653           14.70s
       580          64.6209           14.66s
       581          64.5910           14.61s
       582          64.5581           14.58s
       583          64.4211           14.54s
       584          64.1944           14.49s
       585          63.9831           14.45s
       586          63.9229           14.41s
       587          63.7253           14.36s
       588          63.4544           14.32s
       589          63.3722           14.27s
       590          63.3530           14.23s
       591          63.0552           14.19s
       592          62.9333           14.15s
       593          62.7625           14.10s
       594          62.6070           14.05s
       595          62.3913           14.01s
       596          62.0937           13.97s
       597          61.8982           13.92s
       598          61.7319           13.89s
       599          61.5614           13.84s
       600          61.4060           13.80s
       601          61.2164           13.76s
       602          61.0710           13.72s
       603          60.9116           13.68s
       604          60.8973           13.63s
       605          60.7552           13.59s
       606          60.5312           13.55s
       607          60.3324           13.52s
       608          60.0307           13.47s
       609          59.9659           13.43s
       610          59.8900           13.39s
       611          59.7535           13.35s
       612          59.5479           13.31s
       613          59.3687           13.26s
       614          59.1659           13.21s
       615          59.0192           13.17s
       616          58.9590           13.13s
       617          58.7784           13.09s
       618          58.5905           13.04s
       619          58.3234           13.00s
       620          58.1259           12.95s
       621          57.8951           12.92s
       622          57.7173           12.87s
       623          57.5178           12.83s
       624          57.3506           12.78s
       625          57.1682           12.75s
       626          57.0012           12.71s
       627          56.8050           12.67s
       628          56.6516           12.62s
       629          56.5408           12.58s
       630          56.3357           12.54s
       631          56.1726           12.50s
       632          55.9900           12.45s
       633          55.7686           12.42s
       634          55.5224           12.37s
       635          55.3580           12.33s
       636          55.2401           12.29s
       637          55.1066           12.25s
       638          54.9361           12.20s
       639          54.7496           12.17s
       640          54.5293           12.12s
       641          54.4655           12.08s
       642          54.3074           12.05s
       643          54.0834           12.01s
       644          53.9212           11.97s
       645          53.7599           11.94s
       646          53.5608           11.89s
       647          53.4260           11.86s
       648          53.3616           11.81s
       649          53.2499           11.77s
       650          53.0530           11.74s
       651          52.9704           11.69s
       652          52.8286           11.65s
       653          52.7410           11.62s
       654          52.6841           11.57s
       655          52.6646           11.53s
       656          52.5453           11.50s
       657          52.3329           11.45s
       658          52.1643           11.41s
       659          52.0198           11.37s
       660          51.8101           11.33s
       661          51.7218           11.29s
       662          51.6232           11.26s
       663          51.5146           11.21s
       664          51.3580           11.18s
       665          51.3231           11.14s
       666          51.2168           11.10s
       667          51.0286           11.06s
       668          50.7740           11.02s
       669          50.5981           10.99s
       670          50.4105           10.94s
       671          50.2143           10.90s
       672          50.0306           10.86s
       673          49.9703           10.83s
       674          49.8605           10.78s
       675          49.6516           10.74s
       676          49.5213           10.71s
       677          49.4144           10.67s
       678          49.2888           10.63s
       679          49.1202           10.59s
       680          48.9235           10.56s
       681          48.7066           10.52s
       682          48.6437           10.48s
       683          48.6177           10.44s
       684          48.5253           10.40s
       685          48.3849           10.36s
       686          48.2448           10.33s
       687          48.1580           10.29s
       688          48.0233           10.25s
       689          47.9009           10.22s
       690          47.8469           10.18s
       691          47.7222           10.13s
       692          47.5874           10.10s
       693          47.4416           10.06s
       694          47.2562           10.03s
       695          47.1384            9.99s
       696          46.9465            9.95s
       697          46.8536            9.92s
       698          46.7515            9.88s
       699          46.6362            9.84s
       700          46.4215            9.80s
       701          46.3320            9.77s
       702          46.2938            9.73s
       703          46.1982            9.70s
       704          46.0288            9.66s
       705          45.9857            9.62s
       706          45.8746            9.59s
       707          45.7450            9.55s
       708          45.5674            9.51s
       709          45.4179            9.48s
       710          45.2309            9.44s
       711          45.1916            9.41s
       712          45.0643            9.38s
       713          45.0541            9.34s
       714          45.0153            9.31s
       715          45.0060            9.27s
       716          44.9672            9.23s
       717          44.8673            9.20s
       718          44.8333            9.16s
       719          44.7194            9.12s
       720          44.6010            9.09s
       721          44.4671            9.06s
       722          44.2916            9.02s
       723          44.0836            8.99s
       724          43.9025            8.95s
       725          43.7658            8.92s
       726          43.6922            8.88s
       727          43.5902            8.85s
       728          43.4491            8.81s
       729          43.3530            8.78s
       730          43.2013            8.75s
       731          43.0265            8.71s
       732          42.9327            8.67s
       733          42.8232            8.64s
       734          42.6374            8.61s
       735          42.5717            8.57s
       736          42.4095            8.54s
       737          42.2902            8.51s
       738          42.2453            8.47s
       739          42.1743            8.43s
       740          42.1129            8.40s
       741          42.0755            8.36s
       742          41.9594            8.33s
       743          41.8395            8.30s
       744          41.7572            8.26s
       745          41.7133            8.23s
       746          41.6875            8.19s
       747          41.5919            8.16s
       748          41.4996            8.12s
       749          41.4349            8.09s
       750          41.3427            8.05s
       751          41.2793            8.02s
       752          41.2496            7.98s
       753          41.1273            7.95s
       754          41.0416            7.92s
       755          41.0218            7.88s
       756          40.9966            7.85s
       757          40.9027            7.81s
       758          40.8840            7.77s
       759          40.7501            7.74s
       760          40.6823            7.70s
       761          40.6609            7.67s
       762          40.5430            7.64s
       763          40.5122            7.60s
       764          40.3722            7.56s
       765          40.1975            7.53s
       766          40.0731            7.49s
       767          39.9824            7.45s
       768          39.8471            7.42s
       769          39.7286            7.39s
       770          39.5562            7.35s
       771          39.5144            7.31s
       772          39.3907            7.28s
       773          39.2828            7.24s
       774          39.2567            7.21s
       775          39.2142            7.17s
       776          39.1119            7.14s
       777          38.9896            7.11s
       778          38.8926            7.07s
       779          38.8306            7.04s
       780          38.7102            7.01s
       781          38.5614            6.97s
       782          38.4628            6.93s
       783          38.3710            6.90s
       784          38.3318            6.87s
       785          38.1993            6.83s
       786          38.1081            6.80s
       787          38.0319            6.76s
       788          37.9281            6.72s
       789          37.8637            6.69s
       790          37.7878            6.66s
       791          37.6489            6.62s
       792          37.5946            6.59s
       793          37.4523            6.55s
       794          37.3201            6.52s
       795          37.2878            6.48s
       796          37.1762            6.45s
       797          37.0998            6.41s
       798          37.0179            6.38s
       799          36.9978            6.34s
       800          36.8524            6.31s
       801          36.7668            6.27s
       802          36.6426            6.24s
       803          36.5136            6.21s
       804          36.4533            6.17s
       805          36.4447            6.14s
       806          36.3042            6.11s
       807          36.2912            6.07s
       808          36.2097            6.04s
       809          36.1435            6.00s
       810          36.0625            5.97s
       811          35.9498            5.94s
       812          35.9094            5.90s
       813          35.7931            5.87s
       814          35.6817            5.84s
       815          35.5356            5.81s
       816          35.4242            5.77s
       817          35.3029            5.74s
       818          35.2166            5.70s
       819          35.1120            5.67s
       820          34.9897            5.63s
       821          34.8609            5.60s
       822          34.7673            5.57s
       823          34.6475            5.53s
       824          34.6127            5.50s
       825          34.5338            5.47s
       826          34.4548            5.43s
       827          34.4385            5.40s
       828          34.3773            5.36s
       829          34.3646            5.33s
       830          34.3362            5.30s
       831          34.2979            5.27s
       832          34.1914            5.23s
       833          34.1790            5.20s
       834          34.0776            5.16s
       835          33.9618            5.13s
       836          33.8766            5.10s
       837          33.8193            5.06s
       838          33.6672            5.03s
       839          33.5876            5.00s
       840          33.4777            4.97s
       841          33.3927            4.93s
       842          33.2647            4.90s
       843          33.1527            4.87s
       844          33.0205            4.84s
       845          32.9235            4.80s
       846          32.8285            4.77s
       847          32.7425            4.74s
       848          32.6951            4.71s
       849          32.6660            4.67s
       850          32.5655            4.64s
       851          32.4801            4.61s
       852          32.3951            4.57s
       853          32.3400            4.54s
       854          32.2870            4.51s
       855          32.2174            4.48s
       856          32.1503            4.44s
       857          32.0657            4.41s
       858          32.0158            4.38s
       859          31.9390            4.34s
       860          31.9100            4.31s
       861          31.8524            4.28s
       862          31.7961            4.25s
       863          31.7432            4.22s
       864          31.6519            4.18s
       865          31.5790            4.15s
       866          31.5639            4.12s
       867          31.4671            4.09s
       868          31.3659            4.06s
       869          31.3363            4.02s
       870          31.3000            3.99s
       871          31.1804            3.96s
       872          31.1474            3.93s
       873          31.0948            3.89s
       874          31.0788            3.86s
       875          30.9843            3.83s
       876          30.9696            3.80s
       877          30.9129            3.77s
       878          30.8899            3.74s
       879          30.8558            3.70s
       880          30.8006            3.67s
       881          30.7013            3.64s
       882          30.6438            3.61s
       883          30.5413            3.58s
       884          30.5169            3.54s
       885          30.4661            3.51s
       886          30.4457            3.48s
       887          30.3474            3.45s
       888          30.2849            3.42s
       889          30.1955            3.38s
       890          30.1140            3.35s
       891          30.0319            3.32s
       892          29.9482            3.29s
       893          29.9119            3.26s
       894          29.8963            3.22s
       895          29.8112            3.19s
       896          29.7434            3.16s
       897          29.6865            3.13s
       898          29.6764            3.10s
       899          29.6593            3.07s
       900          29.6124            3.04s
       901          29.5498            3.01s
       902          29.4566            2.97s
       903          29.3630            2.94s
       904          29.2683            2.91s
       905          29.2188            2.88s
       906          29.1751            2.85s
       907          29.1173            2.82s
       908          29.0323            2.79s
       909          28.9361            2.76s
       910          28.8543            2.72s
       911          28.7708            2.69s
       912          28.7260            2.66s
       913          28.6268            2.63s
       914          28.5262            2.60s
       915          28.3943            2.57s
       916          28.3001            2.54s
       917          28.2050            2.51s
       918          28.1342            2.48s
       919          28.1116            2.45s
       920          28.0228            2.42s
       921          27.9279            2.38s
       922          27.8416            2.35s
       923          27.7658            2.32s
       924          27.6785            2.29s
       925          27.6202            2.26s
       926          27.5328            2.23s
       927          27.4919            2.20s
       928          27.4607            2.17s
       929          27.3750            2.14s
       930          27.3617            2.11s
       931          27.3330            2.08s
       932          27.2761            2.05s
       933          27.1629            2.02s
       934          27.0954            1.99s
       935          27.0213            1.96s
       936          26.9433            1.93s
       937          26.8709            1.89s
       938          26.8051            1.86s
       939          26.7244            1.83s
       940          26.6764            1.80s
       941          26.6321            1.77s
       942          26.6145            1.74s
       943          26.5278            1.71s
       944          26.4486            1.68s
       945          26.3539            1.65s
       946          26.3082            1.62s
       947          26.2037            1.59s
       948          26.1147            1.56s
       949          26.0612            1.53s
       950          25.9780            1.50s
       951          25.9032            1.47s
       952          25.8200            1.44s
       953          25.7387            1.41s
       954          25.6558            1.38s
       955          25.5928            1.35s
       956          25.5086            1.32s
       957          25.4100            1.29s
       958          25.3567            1.26s
       959          25.3347            1.23s
       960          25.2719            1.20s
       961          25.2647            1.17s
       962          25.1735            1.14s
       963          25.0875            1.11s
       964          24.9945            1.08s
       965          24.9195            1.05s
       966          24.8384            1.02s
       967          24.7877            0.99s
       968          24.7641            0.96s
       969          24.7228            0.93s
       970          24.6919            0.90s
       971          24.6109            0.87s
       972          24.5523            0.84s
       973          24.4956            0.81s
       974          24.4423            0.78s
       975          24.3673            0.75s
       976          24.3140            0.72s
       977          24.2648            0.69s
       978          24.2199            0.66s
       979          24.2152            0.63s
       980          24.1410            0.60s
       981          24.0767            0.57s
       982          24.0493            0.54s
       983          24.0357            0.51s
       984          24.0151            0.48s
       985          23.9303            0.45s
       986          23.9202            0.42s
       987          23.8490            0.39s
       988          23.7701            0.36s
       989          23.7466            0.33s
       990          23.6749            0.30s
       991          23.6458            0.27s
       992          23.6191            0.24s
       993          23.5652            0.21s
       994          23.5566            0.18s
       995          23.5069            0.15s
       996          23.4597            0.12s
       997          23.4124            0.09s
       998          23.3412            0.06s
       999          23.2403            0.03s
      1000          23.1917            0.00s
RMSE of the gradient boosting model on the validation set: 35.11544177065954
In [37]:
# table of feature importance
gbr_imp = list(zip(features, gbr.feature_importances_))
gbr_imp_df = pd.DataFrame(gbr_imp, columns=['feature', 'varimp'])
gbr_imp_df = gbr_imp_df.sort_values('varimp', ascending=False)
In [38]:
# top 10 feature importances
px.pie(gbr_imp_df.head(10), names='feature', values='varimp', title='Top 10 Feature Importance for Gradient Boosting', hole=.2, template='ggplot2')
[Figure: pie chart "Top 10 Feature Importance for Gradient Boosting". Legend entries: lag_24, hour, lag_1, lag_2, lag_17, lag_7, lag_23, rolling_mean, lag_3, lag_12. Slice labels: 49.2%, 14.5%, 13.2%, 4.48%, 3.68%, 3.52%, 3.35%, 3.15%, 2.58%, 2.4%.]

For the gradient boosting model, lag_24 is the most important feature, followed by lag_1. The RMSE on the validation set is 35.12.
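
The feature-importance table is built the same way for each model, so the steps could be wrapped in a small helper. The sketch below is illustrative only: `importance_table` is a hypothetical name that does not appear in the notebook, and the feature names and scores in the usage lines are made-up placeholders rather than the notebook's values.

```python
import pandas as pd


def importance_table(feature_names, importances, top_n=None):
    """Return a DataFrame of features sorted by descending importance.

    Hypothetical helper; `feature_names` and `importances` must have equal length.
    """
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    table = pd.DataFrame({"feature": list(feature_names), "varimp": list(importances)})
    table = table.sort_values("varimp", ascending=False).reset_index(drop=True)
    return table if top_n is None else table.head(top_n)


# Placeholder inputs for illustration only (not the notebook's real scores)
demo = importance_table(["a", "b", "c"], [0.2, 0.5, 0.3], top_n=2)
print(demo)
```

With the notebook's objects, the equivalent call would presumably be `importance_table(features, gbr.feature_importances_, top_n=10)`, and the result could be passed straight to `px.pie`.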

XGBoost¶

In [39]:
# XGB 
xgbr = XGBRegressor(learning_rate=0.09, n_estimators=800, eval_metric='rmse', random_state=19, max_depth=6, early_stopping_rounds=500)

xgbr.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)], verbose=20)

# Make predictions on the validation set
predictions_xgbr = xgbr.predict(X_valid)

result = mse(y_valid, predictions_xgbr) ** 0.5 # calculate RMSE on validation set
print()
print("RMSE of the xgbm model on the validation set:", result)
[0]	validation_0-rmse:75.35697	validation_1-rmse:110.75678
[20]	validation_0-rmse:21.06562	validation_1-rmse:42.97821
[40]	validation_0-rmse:14.83272	validation_1-rmse:33.75202
[60]	validation_0-rmse:13.20043	validation_1-rmse:32.41903
[80]	validation_0-rmse:12.03843	validation_1-rmse:32.01924
[100]	validation_0-rmse:11.16971	validation_1-rmse:31.98212
[120]	validation_0-rmse:10.32577	validation_1-rmse:31.87761
[140]	validation_0-rmse:9.64544	validation_1-rmse:31.72762
[160]	validation_0-rmse:8.97175	validation_1-rmse:31.71167
[180]	validation_0-rmse:8.37861	validation_1-rmse:31.73429
[200]	validation_0-rmse:7.92080	validation_1-rmse:31.72392
[220]	validation_0-rmse:7.42729	validation_1-rmse:31.73696
[240]	validation_0-rmse:6.85370	validation_1-rmse:31.72332
[260]	validation_0-rmse:6.51763	validation_1-rmse:31.76170
[280]	validation_0-rmse:6.17613	validation_1-rmse:31.70083
[300]	validation_0-rmse:5.81158	validation_1-rmse:31.70956
[320]	validation_0-rmse:5.53704	validation_1-rmse:31.73578
[340]	validation_0-rmse:5.18261	validation_1-rmse:31.77532
[360]	validation_0-rmse:4.87300	validation_1-rmse:31.75781
[380]	validation_0-rmse:4.63794	validation_1-rmse:31.74164
[400]	validation_0-rmse:4.38308	validation_1-rmse:31.72114
[420]	validation_0-rmse:4.17729	validation_1-rmse:31.73294
[440]	validation_0-rmse:3.93979	validation_1-rmse:31.74836
[460]	validation_0-rmse:3.63146	validation_1-rmse:31.74898
[480]	validation_0-rmse:3.38108	validation_1-rmse:31.76431
[500]	validation_0-rmse:3.14939	validation_1-rmse:31.74403
[520]	validation_0-rmse:3.00472	validation_1-rmse:31.74391
[540]	validation_0-rmse:2.78350	validation_1-rmse:31.75152
[560]	validation_0-rmse:2.62567	validation_1-rmse:31.73731
[580]	validation_0-rmse:2.43262	validation_1-rmse:31.74129
[600]	validation_0-rmse:2.29920	validation_1-rmse:31.73506
[620]	validation_0-rmse:2.14859	validation_1-rmse:31.74324
[640]	validation_0-rmse:2.01485	validation_1-rmse:31.75158
[658]	validation_0-rmse:1.90277	validation_1-rmse:31.74793

RMSE of the xgbm model on the validation set: 31.691194462756986
In [40]:
# table of feature importance
xgbr_imp = [t for t in zip(features, xgbr.feature_importances_)]
xgbr_imp_df = pd.DataFrame(xgbr_imp, columns=['feature', 'varimp'])
xgbr_imp_df = xgbr_imp_df.sort_values('varimp', ascending=False)
In [41]:
# top 10 feature importances
px.pie(xgbr_imp_df.head(10), names='feature', values='varimp', title='Top 10 Feature Importance for XG Boost', hole=.2, template='ggplot2')
[Figure: pie chart, Top 10 Feature Importance for XG Boost. Slices: lag_24 53.1%, hour 17.5%, lag_1 7.81%, lag_7 3.61%, lag_2 3.51%, lag_23 3.35%, lag_12 3.04%, lag_17 3.01%, lag_3 2.66%, rolling_mean 2.46%]

For the XG Boost model, lag_24 is by far the most important feature (53.1% of the top-10 importance), followed by hour (17.5%) and lag_1 (7.81%). The validation RMSE is 31.69.
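Every score in this notebook is computed as `mse(...) ** 0.5`, i.e. the square root of the mean squared error. As a minimal sketch of what that expression does, here is a plain-Python version; the helper name `rmse` and the sample order counts are assumptions made up for illustration and are not taken from the dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hourly order counts, for illustration only
actual = [100, 120, 90, 150]
predicted = [110, 115, 95, 130]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a single badly missed peak hour raises the RMSE more than several small misses do.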

Light GBM¶

In [42]:
# LGBM

# Create a LightGBM dataset
lgb_train = lgb.Dataset(X_train, y_train)
lgb_valid = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

# Define the parameters for the LightGBM model
params = {
    'objective': 'regression',
    'metric': 'root_mean_squared_error',
    'boosting_type': 'gbdt',
    'random_state': 19
}

# Train the LightGBM model
lgbm = lgb.train(params, lgb_train, valid_sets=lgb_valid, num_boost_round=500, early_stopping_rounds=50)

# Make predictions on the validation set
predictions_valid_lgbm = lgbm.predict(X_valid)

result = mse(y_valid, predictions_valid_lgbm) ** 0.5 # calculate RMSE on validation set
print()
print("RMSE of the lgbm model on the validation set:", result)
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002680 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 3512, number of used features: 29
[LightGBM] [Info] Start training from score 74.420273
[1]	valid_0's rmse: 55.5412
Training until validation scores don't improve for 50 rounds
C:\Users\XIX\anaconda3\lib\site-packages\lightgbm\engine.py:181: UserWarning:

'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.

[2]	valid_0's rmse: 53.2132
[3]	valid_0's rmse: 50.8822
[4]	valid_0's rmse: 48.7882
[5]	valid_0's rmse: 47.0952
[6]	valid_0's rmse: 45.4315
[7]	valid_0's rmse: 44.125
[8]	valid_0's rmse: 42.941
[9]	valid_0's rmse: 41.8438
[10]	valid_0's rmse: 40.8164
[11]	valid_0's rmse: 39.8395
[12]	valid_0's rmse: 38.923
[13]	valid_0's rmse: 38.1671
[14]	valid_0's rmse: 37.5767
[15]	valid_0's rmse: 37.0534
[16]	valid_0's rmse: 36.5322
[17]	valid_0's rmse: 36.0722
[18]	valid_0's rmse: 35.8637
[19]	valid_0's rmse: 35.2914
[20]	valid_0's rmse: 34.9924
[21]	valid_0's rmse: 34.729
[22]	valid_0's rmse: 34.3941
[23]	valid_0's rmse: 34.0822
[24]	valid_0's rmse: 33.7881
[25]	valid_0's rmse: 33.492
[26]	valid_0's rmse: 33.4272
[27]	valid_0's rmse: 33.223
[28]	valid_0's rmse: 33.1671
[29]	valid_0's rmse: 33.0753
[30]	valid_0's rmse: 32.9774
[31]	valid_0's rmse: 32.8236
[32]	valid_0's rmse: 32.7522
[33]	valid_0's rmse: 32.599
[34]	valid_0's rmse: 32.5062
[35]	valid_0's rmse: 32.4323
[36]	valid_0's rmse: 32.3681
[37]	valid_0's rmse: 32.2884
[38]	valid_0's rmse: 32.198
[39]	valid_0's rmse: 32.1224
[40]	valid_0's rmse: 32.1494
[41]	valid_0's rmse: 32.1236
[42]	valid_0's rmse: 32.0699
[43]	valid_0's rmse: 32.0329
[44]	valid_0's rmse: 31.9388
[45]	valid_0's rmse: 31.8187
[46]	valid_0's rmse: 31.7902
[47]	valid_0's rmse: 31.7698
[48]	valid_0's rmse: 31.7703
[49]	valid_0's rmse: 31.8217
[50]	valid_0's rmse: 31.7187
[51]	valid_0's rmse: 31.7468
[52]	valid_0's rmse: 31.7403
[53]	valid_0's rmse: 31.7101
[54]	valid_0's rmse: 31.7397
[55]	valid_0's rmse: 31.707
[56]	valid_0's rmse: 31.7258
[57]	valid_0's rmse: 31.6866
[58]	valid_0's rmse: 31.6803
[59]	valid_0's rmse: 31.6853
[60]	valid_0's rmse: 31.7247
[61]	valid_0's rmse: 31.7116
[62]	valid_0's rmse: 31.7258
[63]	valid_0's rmse: 31.7189
[64]	valid_0's rmse: 31.6419
[65]	valid_0's rmse: 31.6094
[66]	valid_0's rmse: 31.6382
[67]	valid_0's rmse: 31.6472
[68]	valid_0's rmse: 31.6152
[69]	valid_0's rmse: 31.5981
[70]	valid_0's rmse: 31.6063
[71]	valid_0's rmse: 31.6044
[72]	valid_0's rmse: 31.606
[73]	valid_0's rmse: 31.6105
[74]	valid_0's rmse: 31.6097
[75]	valid_0's rmse: 31.6225
[76]	valid_0's rmse: 31.5469
[77]	valid_0's rmse: 31.5437
[78]	valid_0's rmse: 31.5638
[79]	valid_0's rmse: 31.5432
[80]	valid_0's rmse: 31.5412
[81]	valid_0's rmse: 31.5371
[82]	valid_0's rmse: 31.5106
[83]	valid_0's rmse: 31.5192
[84]	valid_0's rmse: 31.5331
[85]	valid_0's rmse: 31.5491
[86]	valid_0's rmse: 31.5806
[87]	valid_0's rmse: 31.5929
[88]	valid_0's rmse: 31.5815
[89]	valid_0's rmse: 31.5635
[90]	valid_0's rmse: 31.5644
[91]	valid_0's rmse: 31.5619
[92]	valid_0's rmse: 31.5618
[93]	valid_0's rmse: 31.5567
[94]	valid_0's rmse: 31.5851
[95]	valid_0's rmse: 31.598
[96]	valid_0's rmse: 31.5681
[97]	valid_0's rmse: 31.5707
[98]	valid_0's rmse: 31.5517
[99]	valid_0's rmse: 31.5489
[100]	valid_0's rmse: 31.566
[101]	valid_0's rmse: 31.5811
[102]	valid_0's rmse: 31.584
[103]	valid_0's rmse: 31.6054
[104]	valid_0's rmse: 31.6333
[105]	valid_0's rmse: 31.6272
[106]	valid_0's rmse: 31.6618
[107]	valid_0's rmse: 31.6606
[108]	valid_0's rmse: 31.6653
[109]	valid_0's rmse: 31.6626
[110]	valid_0's rmse: 31.6922
[111]	valid_0's rmse: 31.7095
[112]	valid_0's rmse: 31.6929
[113]	valid_0's rmse: 31.6874
[114]	valid_0's rmse: 31.6709
[115]	valid_0's rmse: 31.6573
[116]	valid_0's rmse: 31.6559
[117]	valid_0's rmse: 31.6572
[118]	valid_0's rmse: 31.666
[119]	valid_0's rmse: 31.6794
[120]	valid_0's rmse: 31.6797
[121]	valid_0's rmse: 31.608
[122]	valid_0's rmse: 31.6097
[123]	valid_0's rmse: 31.6293
[124]	valid_0's rmse: 31.6424
[125]	valid_0's rmse: 31.6139
[126]	valid_0's rmse: 31.6059
[127]	valid_0's rmse: 31.6064
[128]	valid_0's rmse: 31.6036
[129]	valid_0's rmse: 31.6035
[130]	valid_0's rmse: 31.608
[131]	valid_0's rmse: 31.6322
[132]	valid_0's rmse: 31.6515
Early stopping, best iteration is:
[82]	valid_0's rmse: 31.5106

RMSE of the lgbm model on the validation set: 31.5105670958812
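The log above stops at round 132 and reports round 82 as best, which matches a patience of 50 rounds. The rule can be sketched in plain Python as follows; this is an illustration of the idea under simplified assumptions, not LightGBM's actual implementation, and the function name and toy scores are hypothetical.

```python
def early_stopping_round(valid_scores, patience):
    """Return (best_round, stop_round), both 1-based.

    Training stops once `patience` rounds have passed without a new
    best (lowest) validation score.
    """
    best_score = float("inf")
    best_round = 0
    for round_no, score in enumerate(valid_scores, start=1):
        if score < best_score:
            best_score, best_round = score, round_no
        elif round_no - best_round >= patience:
            return best_round, round_no
    # Ran out of rounds before the patience was exhausted
    return best_round, len(valid_scores)

# Toy validation RMSE curve (hypothetical values)
scores = [55.5, 40.8, 33.0, 31.5, 31.6, 31.7, 31.6, 31.9]
best, stopped = early_stopping_round(scores, patience=3)
print(best, stopped)
```

With a patience of 3, the toy curve's best round is 4 and training stops at round 7, the same "best round plus patience" relationship as 82 and 132 in the log.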
In [43]:
# Get the feature importances of the trained model
lgbm_importances = lgbm.feature_importance()

# Create a dataframe with the feature importances and the corresponding feature names
lgbm_importances_df = pd.DataFrame({'feature':X_train.columns, 'importance':lgbm_importances})

# Sort the dataframe by importance
lgbm_importances_df.sort_values(by='importance', ascending=False, inplace=True)
In [44]:
# top 10 feature importances
px.pie(lgbm_importances_df.head(10), names='feature', values='importance', title='Top 10 Feature Importance for Light GBM Regression', hole=.2, template='ggplot2')
[Figure: pie chart, Top 10 Feature Importance for Light GBM Regression. Slices: hour 14%, lag_1 12.6%, lag_24 11.8%, lag_7 9.36%, lag_17 9.18%, lag_2 8.9%, lag_11 8.71%, lag_12 8.53%, lag_23 8.53%, dayofweek 8.43%]

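For the Light GBM model, hour is the most important feature (14% of the top-10 importance), followed by lag_1 (12.6%) and lag_24 (11.8%). Importance is spread much more evenly than in the XG Boost model. The validation RMSE is 31.51.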
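The pie chart only receives the ten largest importances, so each percentage is a share of the top-10 total rather than of all 29 features. A minimal sketch of that normalization, using a hypothetical helper `top_n_shares` and made-up raw values (by default `feature_importance()` returns split counts, but the numbers below are not the model's):

```python
def top_n_shares(importances, n=10):
    """Rank features by raw importance and express the top n as percentages
    of the top-n total, which is what a pie chart of those rows displays."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(value for _, value in ranked)
    return [(name, 100 * value / total) for name, value in ranked]

# Hypothetical raw importances, for illustration only
raw = {"hour": 300, "lag_1": 270, "lag_24": 250, "dayofweek": 180}
shares = top_n_shares(raw, n=3)
for name, share in shares:
    print(f"{name}: {share:.1f}%")
```

Dividing by the sum over all features instead would give each feature's share of the whole model, which is the fairer number to quote when comparing models.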

Catboost¶

In [ ]:
# catboost

catb = CatBoostRegressor(task_type='GPU', loss_function='RMSE', eval_metric='RMSE', iterations=2000, random_seed=19, early_stopping_rounds=500)

catb.fit(X_train, y_train, eval_set=(X_valid, y_valid), verbose=100, use_best_model=True, plot=True)

# Make predictions on the validation set
predictions_valid_catb = catb.predict(X_valid)

result = mse(y_valid, predictions_valid_catb) ** 0.5 # calculate RMSE on validation set
print()
print("RMSE of the CatBoost model on the validation set:", result)
catb.best_score_
In [ ]:
# Get the feature importances of the trained model
catb_importances = catb.feature_importances_

# Create a dataframe with the feature importances and the corresponding feature names
catb_importances_df = pd.DataFrame({'feature':X_train.columns, 'importance':catb_importances})

# Sort the dataframe by importance
catb_importances_df.sort_values(by='importance', ascending=False, inplace=True)
In [ ]:
# top 10 feature importances
px.pie(catb_importances_df.head(10), names='feature', values='importance', title='Top 10 Feature Importance for Catboost Regression', hole=.2, template='ggplot2')

For the Catboost model, lag_24 is the most important feature, followed by lag_1. The validation RMSE is 30.42.

Voting Regressor¶

In [ ]:
# Training regressors
# reg1 = RandomForestRegressor(random_state=12345, n_estimators=90, max_depth=100)
# reg2 = GradientBoostingRegressor(random_state=19, learning_rate=0.2, n_estimators=1000, verbose=1, max_depth=3)
reg3 = XGBRegressor(learning_rate=0.09, n_estimators=800, eval_metric='rmse', random_state=19, max_depth=6) #, early_stopping_rounds=500)
reg4 = lgb.LGBMRegressor(objective='regression', metric='root_mean_squared_error', boosting_type='gbdt', random_state=19) #, early_stopping_rounds=50)
reg5 = CatBoostRegressor(task_type='GPU', loss_function='RMSE', eval_metric='RMSE', iterations=2000, random_seed=19) #, early_stopping_rounds=500)

ereg = VotingRegressor(estimators=[#('rf', reg1), 
                                #('gbr', reg2), 
                                ('xgb', reg3), 
                                ('lgb', reg4), 
                                ('cat', reg5)], 
                                verbose=1)
ereg = ereg.fit(X_train, y_train)

# Make predictions on the validation set
predictions_valid_ereg = ereg.predict(X_valid)

result = mse(y_valid, predictions_valid_ereg) ** 0.5 # calculate RMSE on validation set
print()
print("voting regressor model on the valid set: ", result)

The voting regressor combines the three best-performing single models with their parameters: XG Boost, Light GBM, and Catboost. Its validation RMSE is 30.78, which beats XG Boost (31.69) and Light GBM (31.51) and is second only to Catboost alone (30.42).
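A voting regressor predicts by averaging the predictions of its fitted estimators, with equal weights unless weights are supplied. The sketch below illustrates only that averaging step in plain Python; the function name and the prediction values are hypothetical, and it does not fit any models.

```python
def voting_average(predictions_by_model, weights=None):
    """Average per-sample predictions across models (optionally weighted)."""
    models = list(predictions_by_model)
    if weights is None:
        weights = {m: 1.0 for m in models}  # equal vote for every model
    total_weight = sum(weights[m] for m in models)
    n_samples = len(predictions_by_model[models[0]])
    return [
        sum(weights[m] * predictions_by_model[m][i] for m in models) / total_weight
        for i in range(n_samples)
    ]

# Hypothetical predictions for two hours from three models
preds = {"xgb": [100.0, 80.0], "lgb": [110.0, 70.0], "cat": [90.0, 75.0]}
combined = voting_average(preds)
print(combined)
```

Averaging tends to cancel errors that the models do not share, but it can also pull a strong model toward weaker ones, which is one plausible reason an ensemble can land between its members' scores.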

Comparing Model Scores¶

In [ ]:
# making model scores dataframe
model_scores = pd.DataFrame({'Linear Regression': 34.28, 'Decision Tree': 38.54, 'Random Forest': 32.01, 'Ada Boost': 34.89, 'Gradient Boost': 35.11, 'XG Boost': 31.69, 'Light GBM': 31.51,
             'Catboost': 30.42, 'Voting Regressor': 30.78}, index=['RMSE'])  # list, not set: an index must be ordered
model_scores = model_scores.T
In [ ]:
# Model RMSE scores
px.scatter(model_scores, title='Model RMSE Scores', template='ggplot2', color=model_scores.index, size='RMSE', y='RMSE', size_max=30, labels={'index': 'Model'})

This figure compares the validation RMSE scores of the various models. The three best-performing single models are Catboost (30.42), Light GBM (31.51), and XG Boost (31.69). The voting regressor, at 30.78, falls between Catboost and Light GBM.
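The ranking can be checked directly from the scores in the table above. This short sketch sorts them in ascending order, since a lower RMSE is better; the variable names are chosen here for illustration.

```python
# Validation RMSE per model, copied from the comparison table
model_rmse = {
    "Linear Regression": 34.28,
    "Decision Tree": 38.54,
    "Random Forest": 32.01,
    "Ada Boost": 34.89,
    "Gradient Boost": 35.11,
    "XG Boost": 31.69,
    "Light GBM": 31.51,
    "Catboost": 30.42,
    "Voting Regressor": 30.78,
}

# Lower RMSE is better, so sort ascending by score
ranked = sorted(model_rmse.items(), key=lambda kv: kv[1])
for name, score in ranked[:4]:
    print(f"{name}: {score}")
```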

Final Model¶

In [ ]:
# combine validation set with training set
X_full = pd.concat([X_train, X_valid])
y_full = pd.concat([y_train, y_valid])
In [ ]:
# Training regressors
# reg1 = RandomForestRegressor(random_state=12345, n_estimators=90, max_depth=100)
# reg2 = GradientBoostingRegressor(random_state=19, learning_rate=0.2, n_estimators=1000, verbose=1, max_depth=3)
reg3 = XGBRegressor(learning_rate=0.09, n_estimators=800, eval_metric='rmse', random_state=19, max_depth=6) #, early_stopping_rounds=500)
reg4 = lgb.LGBMRegressor(objective='regression', metric='root_mean_squared_error', boosting_type='gbdt', random_state=19) #, early_stopping_rounds=50)
reg5 = CatBoostRegressor(task_type='GPU', loss_function='RMSE', eval_metric='RMSE', iterations=2000, random_seed=19) #, early_stopping_rounds=500)

final = VotingRegressor(estimators=[#('rf', reg1), 
                                #('gbr', reg2), 
                                ('xgb', reg3), 
                                ('lgb', reg4), 
                                ('cat', reg5)], 
                                verbose=1)
final = final.fit(X_full, y_full)

# Make predictions on the test set
final_predictions = final.predict(X_test)

result = mse(y_test, final_predictions) ** 0.5 # calculate RMSE on test set
print()
print("voting regressor model on the test set: ", result)

The final model is the voting regressor, retrained on the combined training and validation sets. It achieved an RMSE of 41.24 on the test set, below the required threshold of 48, so the model meets the project's accuracy target.

Overall Conclusions¶

Overall, we succeeded in providing Sweet Lift Taxi with a model that predicts the number of orders for the next hour. The target metric was an RMSE under 48. Our final model, a voting regressor, reached an RMSE of 46.47 on the test set, so Sweet Lift can use its hourly forecasts to allocate drivers for peak hours.